
fix: improve binary install experience#38

Merged
beonde merged 2 commits into main from fix/binary-install-experience
Mar 11, 2026

Conversation

@beonde (Member)

@beonde beonde commented Mar 11, 2026

Summary

Fixes binary install experience issues identified in the install audit.

P0 Fixes

  • Visible download output: First-run binary download now writes to stderr so users see progress even without logging configured
  • gRPC health probe: ensure_running() now verifies the gRPC server is actually accepting connections after the socket file appears (not just checking file existence)
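The readiness check described above can be sketched as a deadline loop. This is a rough illustration, not the SDK's actual code: the name `wait_until_ready` and the `probe` callable are stand-ins, where in `ensure_running()` the probe would wrap `grpc.channel_ready_future(channel).result(...)`.

```python
import time

def wait_until_ready(probe, timeout=10.0, poll=0.1):
    """Poll `probe` until it reports ready or the overall deadline passes.

    `probe` is any callable taking a per-attempt timeout (seconds) and
    returning True once the server is actually accepting connections.
    """
    deadline = time.time() + timeout
    while True:
        remaining = deadline - time.time()
        if remaining <= 0:
            return False  # overall deadline exceeded
        if probe(min(1.0, remaining)):
            return True
        time.sleep(poll)  # brief pause so we don't spin in a tight loop
```

The key point is that the loop caps each probe attempt at the time actually left on the overall deadline, rather than a fixed per-attempt value.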

P1 Fixes

  • Retry with backoff: Download retries 3 times with exponential backoff (1s, 2s) before failing
  • Updated test: Download error test updated to work with retry logic
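The retry behavior in the bullets above could look roughly like this. It is a sketch under stated assumptions: `fetch` and `download_with_retry` are placeholder names (the real download lives in `_download_binary`), and the 1s/2s delays mirror the description in this PR.

```python
import sys
import time

def download_with_retry(fetch, attempts=3, base_delay=1.0):
    """Call `fetch` up to `attempts` times with exponential backoff.

    `fetch` stands in for the real HTTP download; progress is written to
    stderr so it is visible even without logging configured.
    """
    last_exc = None
    for attempt in range(1, attempts + 1):
        try:
            print(f"Downloading capiscio-core (attempt {attempt}/{attempts})...",
                  file=sys.stderr)
            return fetch()
        except Exception as exc:
            last_exc = exc
            if attempt < attempts:
                delay = base_delay * (2 ** (attempt - 1))  # 1s, then 2s
                time.sleep(delay)
    raise RuntimeError("Failed to download capiscio-core") from last_exc
```

Chaining the final `RuntimeError` with `from last_exc` preserves the underlying HTTP error for debugging while keeping a stable message for callers to match on.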

Files Changed

  • capiscio_sdk/_rpc/process.py — Download visibility, health probe, retry logic
  • tests/unit/test_process.py — Updated mock for retry-aware error test

Testing

  • 14 unit tests passing (3 pre-existing failures in find_binary_* tests unrelated to this PR)
  • 350 MCP tests passing (companion PR)

- Add visible stderr output during binary download (P0)
- Add gRPC health probe after socket appears in ensure_running (P0)
- Add retry with exponential backoff for download (3 attempts) (P1)
- Update test mock for retry-aware download error handling
@github-actions

✅ Documentation validation passed!

Unified docs will be deployed from capiscio-docs repo.

@github-actions

✅ All checks passed! Ready for review.


Copilot AI left a comment


Pull request overview

Improves the first-run capiscio-core binary installation and startup experience in the Python SDK by making downloads visible to users, adding retry/backoff on download failures, and strengthening ensure_running() to verify gRPC readiness (not just socket existence).

Changes:

  • Emit download/install progress to stderr during first-run binary download.
  • Add up to 3 download attempts with exponential backoff on failure.
  • Add a gRPC channel readiness probe after the socket appears; update the unit test to be retry-aware.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

  • capiscio_sdk/_rpc/process.py — Adds stderr progress output, download retry/backoff, and a gRPC readiness check in ensure_running().
  • tests/unit/test_process.py — Updates the HTTP error download test setup to accommodate the new retry logic.
Comments suppressed due to low confidence (1)

tests/unit/test_process.py:210

  • This test is now described as covering retry behavior, but it only asserts that a RuntimeError is raised. It should also assert retry-specific effects (e.g., httpx.stream called 3 times and time.sleep called with the expected backoff delays) so that the new retry logic can’t regress without failing the test.
    @patch("capiscio_sdk._rpc.process.time.sleep")
    @patch("httpx.stream")
    def test_download_binary_http_error(self, mock_stream, mock_sleep):
        """Test download handles HTTP errors with retries."""
        pm = ProcessManager()
        
        with patch("capiscio_sdk._rpc.process.platform.system", return_value="Linux"):
            with patch("capiscio_sdk._rpc.process.platform.machine", return_value="x86_64"):
                with patch.object(ProcessManager, "_get_cached_binary_path") as mock_cached:
                    mock_path = MagicMock()
                    # exists() called: once before loop, then once per attempt for cleanup
                    mock_path.exists.return_value = False
                    mock_path.parent = MagicMock()
                    mock_cached.return_value = mock_path
                    
                    # Mock HTTP error
                    mock_response = MagicMock()
                    mock_response.raise_for_status.side_effect = Exception("404 Not Found")
                    mock_stream.return_value.__enter__.return_value = mock_response
                    
                    with pytest.raises(RuntimeError, match="Failed to download capiscio-core"):
                        pm._download_binary()
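Copilot's point can be illustrated self-contained: with a retry loop like the one in this PR (recreated below purely for illustration, not the SDK's actual `_download_binary`), mock assertions can pin down the attempt count and backoff delays so the retry logic cannot regress silently.

```python
import time
from unittest.mock import Mock, patch, call

def download(fetch, attempts=3):
    """Illustrative stand-in for a retrying download loop."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt < attempts - 1:
                time.sleep(2 ** attempt)  # 1s, then 2s backoff
    raise RuntimeError("Failed to download capiscio-core")

with patch("time.sleep") as mock_sleep:
    fetch = Mock(side_effect=OSError("404 Not Found"))
    try:
        download(fetch)
    except RuntimeError:
        pass
    # The retry-specific assertions the review asks for:
    assert fetch.call_count == 3
    assert mock_sleep.call_args_list == [call(1), call(2)]
```

In the real test, the equivalent assertions would target `mock_stream.call_count` and the patched `capiscio_sdk._rpc.process.time.sleep`.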


Comment on lines +302 to +310
try:
    channel = grpc.insecure_channel(addr)
    grpc.channel_ready_future(channel).result(timeout=min(1.0, remaining))
    channel.close()
    break
except grpc.FutureTimeoutError:
    pass
except Exception:
    time.sleep(0.1)

Copilot AI Mar 11, 2026


In the gRPC readiness loop, the channel is only closed on the success path. If channel_ready_future(...).result(...) raises (including FutureTimeoutError), the channel stays open, which can leak file descriptors/sockets and keep piling up across retries. Consider creating the channel once per iteration and always closing it in a finally block, and add a small sleep/backoff on FutureTimeoutError to avoid a tight spin loop.

Suggested change
-try:
-    channel = grpc.insecure_channel(addr)
-    grpc.channel_ready_future(channel).result(timeout=min(1.0, remaining))
-    channel.close()
-    break
-except grpc.FutureTimeoutError:
-    pass
-except Exception:
-    time.sleep(0.1)
+channel = grpc.insecure_channel(addr)
+try:
+    iter_remaining = deadline - time.time()
+    if iter_remaining <= 0:
+        break
+    grpc.channel_ready_future(channel).result(timeout=min(1.0, iter_remaining))
+    break
+except grpc.FutureTimeoutError:
+    # Backoff slightly to avoid a tight spin loop while waiting for readiness
+    time.sleep(0.1)
+except Exception:
+    time.sleep(0.1)
+finally:
+    channel.close()

Comment on lines +302 to +304
try:
    channel = grpc.insecure_channel(addr)
    grpc.channel_ready_future(channel).result(timeout=min(1.0, remaining))

Copilot AI Mar 11, 2026


remaining is computed once before the readiness loop, but each iteration uses result(timeout=min(1.0, remaining)). Near the end of the overall deadline, this can block for up to 1s and exceed the intended timeout. Recompute time_left = deadline - time.time() inside the loop and use that for the per-iteration result() timeout.

Suggested change
-try:
-    channel = grpc.insecure_channel(addr)
-    grpc.channel_ready_future(channel).result(timeout=min(1.0, remaining))
+# Recompute time left for each iteration to avoid overshooting the overall timeout
+time_left = deadline - time.time()
+if time_left <= 0:
+    break
+try:
+    channel = grpc.insecure_channel(addr)
+    grpc.channel_ready_future(channel).result(timeout=min(1.0, time_left))

@codecov

codecov bot commented Mar 11, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.


@github-actions

✅ SDK server contract tests passed (test_server_integration.py). Cross-product scenarios are validated in capiscio-e2e-tests.

- Close channel in finally block to prevent FD leaks on exceptions
- Recompute time_left per iteration to avoid overshooting timeout
- Add backoff sleep on FutureTimeoutError to prevent tight spin
@github-actions

✅ Documentation validation passed!

Unified docs will be deployed from capiscio-docs repo.

@github-actions

✅ All checks passed! Ready for review.

@github-actions

✅ SDK server contract tests passed (test_server_integration.py). Cross-product scenarios are validated in capiscio-e2e-tests.

@beonde beonde merged commit 3711dbe into main Mar 11, 2026
13 checks passed
@beonde beonde deleted the fix/binary-install-experience branch March 11, 2026 20:05